The Design and Implementation of the Parallel Out - of - coreScaLAPACK

نویسنده

  • E. F. D'Azevedo
چکیده

This paper describes the design and implementation of three core factorization routines | LU, QR and Cholesky | included in the out-of-core extension of ScaLAPACK. These routines allow the factorization and solution of a dense system that is too large to t entirely in physical memory. An image of the full matrix is maintained on disk and the factorization routines transfer sub-matrices into memory. Thèleft-looking' column-oriented variant of the factorization algorithm is implemented to reduce the disk I/O traac. The routines are implemented using a portable I/O interface and utilize high performance ScaLAPACK factorization routines as in-core computational kernels. We present the details of the implementation for the out-of-core ScaLAPACK factoriza-tion routines, as well as performance and scalability results on the Intel Paragon. for the use of the computing facilities.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Modified 32-Bit Shift-Add Multiplier Design for Low Power Application

Multiplication is a basic operation in any signal processing application. Multiplication is the most important one among the four arithmetic operations like addition, subtraction, and division. Multipliers are usually hardware intensive, and the main parameters of concern are high speed, low cost, and less VLSI area. The propagation time and power consumption in the multiplier are always high. ...

متن کامل

Design and Implementation of a High Speed Systolic Serial Multiplier and Squarer for Long Unsigned Integer Using VHDL

A systolic serial multiplier for unsigned numbers is presented which operates without zero words inserted between successive data words, outputs the full product and has only one clock cycle latency. The multiplier is based on a modified serial/parallel scheme with two adjacent multiplier cells. Systolic concept is a well-known means of intensive computational task through replication of func...

متن کامل

Design and Implementation of a High Speed Systolic Serial Multiplier and Squarer for Long Unsigned Integer Using VHDL

A systolic serial multiplier for unsigned numbers is presented which operates without zero words inserted between successive data words, outputs the full product and has only one clock cycle latency. &#10The multiplier is based on a modified serial/parallel scheme with two adjacent multiplier cells. Systolic concept is a well-known means of intensive computational task through replication of fu...

متن کامل

Implementation of the direction of arrival estimation algorithms by means of GPU-parallel processing in the Kuda environment (Research Article)

Direction-of-arrival (DOA) estimation of audio signals is critical in different areas, including electronic war, sonar, etc. The beamforming methods like Minimum Variance Distortionless Response (MVDR), Delay-and-Sum (DAS), and subspace-based Multiple Signal Classification (MUSIC) are the most known DOA estimation techniques. The mentioned methods have high computational complexity. Hence using...

متن کامل

The Multidisciplinary Design Optimization of a Reentry Vehicle Using Parallel Genetic Algorithms

The purpose of this paper is to examine the multidisciplinary design optimization (MDO) of a reentry vehicle. In this paper, optimization of a RV based on, minimization of heat flux integral and minimization of axial force coefficient integral and maximization of static margin integral along reentry trajectory is carried out. The classic optimization methods are not applicable here due to the c...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 1997